Semi-Automatic Annotation of Natural Language Vulnerability Reports

Authors

  • Yan Wu
  • Robin A. Gandhi
  • Harvey P. Siy
Abstract

Those who do not learn from past vulnerabilities are bound to repeat them. Consequently, there have been several research efforts to enumerate and categorize software weaknesses that lead to vulnerabilities. The Common Weakness Enumeration (CWE) is a community-developed dictionary of software weakness types and their relationships, designed to consolidate these efforts. Yet aggregating and classifying natural language vulnerability reports with respect to weakness standards is currently a painstaking manual effort. In this paper, the authors present a semi-automated process for annotating vulnerability information with semantic concepts that are traceable to CWE identifiers. The authors present an information-processing pipeline to parse natural language vulnerability reports. The resulting terms are used for learning the syntactic cues in these reports that indicate the corresponding standard weakness definitions. Finally, the results of multiple machine learning algorithms are compared individually as well as collectively to semi-automatically annotate new vulnerability reports.
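The pipeline outlined in the abstract (parse reports into terms, learn cues for each weakness, compare several learners individually and collectively) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the algorithms or the data used, so the three classifiers (naive Bayes, cosine similarity to a per-label centroid, and a distinct-term overlap count), the majority-vote rule, the function names, and the toy training reports are all hypothetical. The CWE identifiers themselves are real entries (CWE-79 cross-site scripting, CWE-89 SQL injection, CWE-120 classic buffer overflow).

```python
import math
import re
from collections import Counter, defaultdict

# Toy reports written for this sketch (hypothetical; not the paper's data set).
TRAIN = [
    ("SQL injection in login form allows remote attackers to execute "
     "arbitrary SQL commands", "CWE-89"),
    ("Unsanitized id parameter is concatenated into a SQL query", "CWE-89"),
    ("Cross-site scripting allows attackers to inject arbitrary web script "
     "or HTML", "CWE-79"),
    ("Reflected XSS via the search parameter injects script into the page",
     "CWE-79"),
    ("Stack-based buffer overflow via a long filename causes a crash",
     "CWE-120"),
    ("Buffer overflow when copying a long string without checking its length",
     "CWE-120"),
]


def tokenize(text):
    """Lowercase a report and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


# Aggregate term frequencies per CWE label (the "learned cues").
COUNTS = defaultdict(Counter)
for _text, _label in TRAIN:
    COUNTS[_label].update(tokenize(_text))
LABELS = sorted(COUNTS)
VOCAB = {term for counts in COUNTS.values() for term in counts}
DOCS_PER_LABEL = Counter(label for _, label in TRAIN)


def naive_bayes(tokens):
    """Multinomial naive Bayes with add-one smoothing."""
    def score(label):
        counts = COUNTS[label]
        denom = sum(counts.values()) + len(VOCAB)
        prior = math.log(DOCS_PER_LABEL[label] / len(TRAIN))
        return prior + sum(math.log((counts[t] + 1) / denom) for t in tokens)
    return max(LABELS, key=score)


def cosine_centroid(tokens):
    """Pick the label whose summed term vector is closest by cosine."""
    vec = Counter(tokens)
    def score(label):
        counts = COUNTS[label]
        dot = sum(n * counts[t] for t, n in vec.items())
        norm = (math.sqrt(sum(n * n for n in vec.values()))
                * math.sqrt(sum(n * n for n in counts.values())))
        return dot / norm if norm else 0.0
    return max(LABELS, key=score)


def cue_overlap(tokens):
    """Pick the label sharing the most distinct terms with the report."""
    terms = set(tokens)
    return max(LABELS, key=lambda label: len(terms.intersection(COUNTS[label])))


CLASSIFIERS = {
    "naive_bayes": naive_bayes,
    "cosine_centroid": cosine_centroid,
    "cue_overlap": cue_overlap,
}


def annotate(report):
    """Return each classifier's vote plus a majority suggestion.

    The suggestion is None when no label gets at least two votes, which
    leaves the report for a human analyst (the "semi" in semi-automatic).
    """
    tokens = tokenize(report)
    votes = {name: clf(tokens) for name, clf in CLASSIFIERS.items()}
    label, n_votes = Counter(votes.values()).most_common(1)[0]
    return {"votes": votes, "suggestion": label if n_votes >= 2 else None}


if __name__ == "__main__":
    print(annotate("Remote attackers can execute SQL commands through the "
                   "username parameter"))
```

On this toy data, the report in the final call is suggested as CWE-89 by all three classifiers. Returning the individual votes alongside the combined suggestion mirrors the idea of comparing the learners both individually and collectively; a realistic setup would need a much larger labeled corpus and a proper evaluation.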

Similar resources

On Designing Controlled Natural Languages for Semantic Annotation

Manual semantic annotation is a complex and arduous task, both time-consuming and costly, often requiring specialist annotators. (Semi)-automatic annotation tools attempt to ease this process by detecting instances of classes within text and relationships between classes; however, their usage often requires knowledge of Natural Language Processing (NLP) and/or formal ontological descriptions. This ...

Wide-Coverage Grammar Extraction from Thai Treebank

Parsing is an important step for natural language understanding, including phrase alignment for supporting statistical machine translation. A parser's ability to analyse real text depends strongly on its grammar. A treebank can be one of the sources for grammar extraction. However, treebank construction largely relies on human annotators' intuitions. Different intuitions from multiple annotators br...

OBA: Supporting Ontology-Based Annotation of Natural Language Resources

In this paper, we introduce OBA – an application for NLP-based annotation of natural language texts with ontology classes and relations. OBA provides support for different tasks required for semi-automatic semantic annotation. Among other things, it supports creating manual semantic annotations in order to enrich the set of lexical patterns, automatically annotating large corpora based on speci...

Semi-automatic compound nouns annotation for data integration systems

Lexical annotation is the explicit inclusion of the “meaning” of a data source element according to a lexical resource. Accuracy of semi-automatic lexical annotator tools is poor on real-world schemata due to the abundance of non-dictionary compound nouns. It follows that a large set of relationships among different schemata is discovered, including a great amount of false positive relationship...

Fast semi-automatic semantic annotation for spoken dialog systems

This paper describes a bootstrapping methodology for semi-automatic semantic annotation of a “mini-corpus” that is conventionally annotated manually to train an initial parser used in natural language understanding (NLU) systems. We propose to cast the problem of semantic annotation as a classification problem: each word is assigned a unique set of semantic tag(s) and/or label(s) from the univ...

Journal:
  • IJSSE

Volume 4

Publication date: 2013